MCN: Modulated Convolutional Network


creased. In particular, to alleviate the disturbance caused by the binarization process, a center loss is designed to incorporate intra-class compactness with the quantization loss and the filter loss. The red arrows show the back-propagation process. By considering the filter loss, center loss, and softmax loss in a unified framework, we achieve much better performance than state-of-the-art binarized models. Most importantly, our MCN model is highly compressed and performs comparably to the well-known full-precision ResNets and WideResNets.

As shown in Fig. 3.1, M-Filters and weights can be jointly optimized end-to-end, resulting in a compact and portable learning architecture. Owing to its low model complexity, such an architecture is less prone to overfitting and is suitable for resource-constrained environments. Specifically, our MCNs reduce the storage required by a full-precision model by a factor of 32, since each 32-bit floating-point weight is replaced by a single bit, while achieving the best performance among existing binarized-filter-based CNNs, even approximating full-precision filters. In addition, the number of model parameters to be optimized is significantly reduced, yielding a computationally efficient CNN model.

3.4.1 Forward Propagation with Modulation

We first elaborate on MCNs as vanilla BNNs with only binarized weights, and design the specific convolutional filters used in our MCNs. Across all layers, we deploy 3D filters of size K × W × W (one filter), where each filter has K planes and each plane is a W × W 2D filter. To use such filters, we extend the input channels of the network, e.g., from RGB to RRRR or (RGB+X) with K = 4, where X denotes any channel. Note that for gray-level images, we only use one channel. Doing so allows us to implement our MCNs quickly on existing deep-learning platforms. After this extension, we directly deploy our filters in the convolution process; the details of the MCN convolution are illustrated in Fig. 3.2(b).
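To make the channel extension concrete, the following PyTorch-style sketch (our own illustration; the variable names and the choice of replicated channel are assumptions, not the authors' code) extends a 3-channel RGB input to K = 4 channels and binarizes one K × W × W filter:

```python
import torch

# Hypothetical sketch of the channel extension described above.
K, W = 4, 3
rgb = torch.randn(1, 3, 32, 32)           # a batch with one RGB image

# "RGB + X" with K = 4: here X simply replicates the R channel.
x = torch.cat([rgb, rgb[:, :1]], dim=1)   # shape (1, 4, 32, 32)

# One MCN filter: K planes, each a W x W 2D filter.
c = torch.randn(K, W, W)

# Vanilla-BNN-style weight binarization to {-1, +1}.
c_hat = torch.where(c >= 0, torch.ones_like(c), -torch.ones_like(c))
```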

To reconstruct unbinarized filters, we introduce a modulated process based on M-Filters and binarized filters. An M-Filter is a matrix that serves as the weight of binarized filters and is also of size K × W × W. Let $M_j$ be the j-th plane of an M-Filter. We define the operation $\otimes$ for a given layer as follows:

\[
\hat{C}_i \otimes M = \sum_{j=1}^{K} \hat{C}_i \circ M'_j, \qquad (3.12)
\]

where $M'_j = (M_j, \ldots, M_j)$ is a 3D matrix built from K copies of the 2D matrix $M_j$, with $j = 1, \ldots, K$, and $\circ$ is the element-wise multiplication operator, also termed the Schur product. In Eq. 3.12, $M$ is a learned weight matrix used to reconstruct the convolutional filters $C_i$ from $\hat{C}_i$ and the operation $\otimes$, and it leads to the filter loss in Eq. 3.18. An example of filter modulation is shown in Fig. 3.2(a). In addition, the operation $\circ$ results in a new matrix (named the reconstructed filter), i.e., $\hat{C}_i \circ M'_j$, which is elaborated in the following. We define:

\[
Q_{ij} = \hat{C}_i \circ M'_j, \qquad (3.13)
\]
\[
Q_i = \{Q_{i1}, \ldots, Q_{iK}\}. \qquad (3.14)
\]
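A minimal sketch of Eqs. 3.12–3.14 via broadcasting (our own illustration for a single filter; tensor names are hypothetical):

```python
import torch

K, W = 4, 3
c0 = torch.randn(K, W, W)
c_hat = torch.where(c0 >= 0, torch.ones_like(c0), -torch.ones_like(c0))
m = torch.rand(K, W, W)                       # M-Filter: planes M_1, ..., M_K

# M'_j = (M_j, ..., M_j): K copies of the 2D plane M_j stacked into 3D.
m_prime = m.unsqueeze(1).expand(K, K, W, W)   # m_prime[j] has shape (K, W, W)

# Eq. 3.13: Q_ij = C_hat_i o M'_j (element-wise / Schur product).
q = c_hat.unsqueeze(0) * m_prime              # q[j] = Q_ij, shape (K, K, W, W)

# Eq. 3.12: C_hat_i (x) M = sum over j of C_hat_i o M'_j.
recon = q.sum(dim=0)                          # reconstructed filter, (K, W, W)
```

Here `q` plays the role of the set $Q_i = \{Q_{i1}, \ldots, Q_{iK}\}$ from Eq. 3.14, and `recon` is the reconstructed filter of Eq. 3.12.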

In testing, $Q_i$ is not predefined but is calculated from Eq. 3.13; an example is shown in Fig. 3.2(a). $Q_i$ is introduced to approximate the unbinarized filters $w_i$ so as to alleviate the information loss caused by the binarization process. In addition, we further require $M \geq 0$ to simplify the reconstruction process.
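One simple way to enforce this constraint (our assumption; the original may realize $M \geq 0$ differently) is to clamp the M-Filter after each optimizer step:

```python
# Keep the learned M-Filter non-negative after each update (sketch).
with torch.no_grad():
    m.clamp_(min=0.0)
```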